fix: D3D12/Vulkan software rendering CI (incl. barrier rewrite and validation fixes)#3122

Merged
xen2 merged 90 commits into master from feature/ci-gpu-vulkan
Apr 8, 2026

@xen2
Member

@xen2 xen2 commented Apr 6, 2026

PR Details

Adds GPU software rendering test infrastructure (D3D12 WARP, Vulkan SwiftShader) to CI, along with a comprehensive rewrite of the D3D12 resource barrier system and numerous graphics API validation fixes across D3D12 and Vulkan, so that all tests pass.

Barrier system rewrite

Split out into a dedicated sub-branch so the changes are easier to review.

  • Cross-platform barrier abstraction (BarrierLayout, BarrierAccess, BarrierSync) shared between D3D12 and Vulkan
  • Per-subresource layout tracking with lazy allocation for cubemaps/arrays
  • D3D12 Enhanced Barriers support (dual-path: legacy + enhanced)
  • Automatic SRV/UAV transitions before Draw/Dispatch via descriptor set tracking
  • Barrier coalescing to eliminate duplicate transitions in a single flush
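The per-subresource tracking and coalescing bullets above can be illustrated with a minimal sketch. It is not the actual Stride implementation: `Layout`, `TrackedTexture` and `BarrierBatch` are hypothetical names, and the layout values are a made-up subset. It assumes a texture keeps a single layout until one subresource diverges (lazy allocation), and that repeated transitions of the same subresource within one flush collapse into a single before/after pair.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Layout(Enum):
    # Hypothetical subset of layouts; not the real BarrierLayout enum.
    COMMON = auto()
    SHADER_RESOURCE = auto()
    RENDER_TARGET = auto()


@dataclass
class TrackedTexture:
    subresource_count: int
    uniform_layout: Layout = Layout.COMMON
    # Allocated lazily: stays None while every subresource shares one layout.
    per_subresource: Optional[list] = None

    def get(self, index: int) -> Layout:
        if self.per_subresource is None:
            return self.uniform_layout
        return self.per_subresource[index]

    def set(self, index: int, layout: Layout) -> None:
        if self.per_subresource is None:
            if layout == self.uniform_layout:
                return  # nothing diverges, so no allocation is needed
            self.per_subresource = [self.uniform_layout] * self.subresource_count
        self.per_subresource[index] = layout


@dataclass
class BarrierBatch:
    # Maps (texture identity, subresource index) -> [layout_before, layout_after].
    pending: dict = field(default_factory=dict)

    def transition(self, tex: TrackedTexture, index: int, new: Layout) -> None:
        old = tex.get(index)
        if old == new:
            return  # already in the requested layout
        key = (id(tex), index)
        if key in self.pending:
            # Coalesce: keep the original "before", only update the "after".
            self.pending[key][1] = new
        else:
            self.pending[key] = [old, new]
        tex.set(index, new)

    def flush(self) -> list:
        # Drop entries that ended up back in their starting layout.
        barriers = [
            (key[1], before, after)
            for key, (before, after) in self.pending.items()
            if before != after
        ]
        self.pending.clear()
        return barriers
```

In this sketch, transitioning face 2 of a six-face cubemap to `RENDER_TARGET` and then to `SHADER_RESOURCE` before flushing yields one barrier (`COMMON` to `SHADER_RESOURCE`), and the per-subresource list is only allocated on the first divergent transition.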

D3D12 fixes

  • Texture views now redirect barriers to parent texture (fixes depth-stencil state mismatch)
  • SRV textures created in ShaderResource state instead of Common (fixes implicit promotion mismatch)
  • GPU flushed before releasing resources during device teardown (fixes use-after-free crash on WARP)
  • Staging texture temp resource deferred via DeferredReleaseQueue (fixes COM Release crash)
  • Copy fence waited before reading staging texture in staging-to-staging copy (fixes data corruption)
  • Descriptor heap leaks fixed in both CommandList teardown and DescriptorAllocator heap rollover
  • Fence Wait skipped on value zero (eliminates debug layer warning)
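The deferred-release idea mentioned for the staging texture can be sketched as a queue keyed on fence values. This is only an illustration under stated assumptions: the class below borrows the `DeferredReleaseQueue` name from the description, but its methods (`defer`, `collect`) are hypothetical, and it assumes fence values are recorded in non-decreasing order. A resource is released only once the GPU has signalled a fence value at or past the one it was deferred with.

```python
from collections import deque


class DeferredReleaseQueue:
    """Illustrative only; the real Stride type may look quite different."""

    def __init__(self):
        # Entries are (fence_value, release_callback), in submission order.
        self._pending = deque()

    def defer(self, fence_value, release):
        # The resource may still be in use by the GPU until fence_value is reached.
        self._pending.append((fence_value, release))

    def collect(self, completed_fence_value):
        # Release everything the GPU is known to have finished with.
        released = 0
        while self._pending and self._pending[0][0] <= completed_fence_value:
            _, release = self._pending.popleft()
            release()
            released += 1
        return released
```

A caller would invoke `collect` with the last completed fence value, for example once per frame, so that a release never races with GPU work that still references the resource.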

Vulkan fixes

  • Correct pipeline stage for HOST access in init barriers
  • Depth aspect auto-detected from format (fixes wrong aspect on depth textures without DepthStencil flag)
  • Correct image view type for 1D/3D textures in render target and depth-stencil views
  • Shared depth texture fallback for shadow maps when atlas is not allocated
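Deriving the image aspect from the format, as in the depth-aspect fix above, could look roughly like the sketch below. The format names and aspect bit values follow the Vulkan specification; the function `aspect_for_format` and the use of strings for formats are assumptions made for the example, not Stride's code.

```python
from enum import IntFlag


class Aspect(IntFlag):
    # Values match VK_IMAGE_ASPECT_*_BIT in the Vulkan headers.
    COLOR = 0x1
    DEPTH = 0x2
    STENCIL = 0x4


_DEPTH_ONLY = {
    "VK_FORMAT_D16_UNORM",
    "VK_FORMAT_X8_D24_UNORM_PACK32",
    "VK_FORMAT_D32_SFLOAT",
}
_STENCIL_ONLY = {"VK_FORMAT_S8_UINT"}
_DEPTH_STENCIL = {
    "VK_FORMAT_D16_UNORM_S8_UINT",
    "VK_FORMAT_D24_UNORM_S8_UINT",
    "VK_FORMAT_D32_SFLOAT_S8_UINT",
}


def aspect_for_format(fmt: str) -> Aspect:
    """Pick the aspect mask from the format alone, ignoring usage flags."""
    if fmt in _DEPTH_STENCIL:
        return Aspect.DEPTH | Aspect.STENCIL
    if fmt in _DEPTH_ONLY:
        return Aspect.DEPTH
    if fmt in _STENCIL_ONLY:
        return Aspect.STENCIL
    return Aspect.COLOR
```

The point of keying on the format is that a depth-format texture created without a depth-stencil usage flag still gets a depth aspect in its barriers and views.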

CI & test infrastructure

  • D3D12 WARP and Vulkan SwiftShader GPU testing on CI
  • Crash dump collection for all crash types (SEH via FirstChanceException, .NET unhandled via DOTNET_DbgEnableMiniDump, native via WER LocalDumps)
  • AnimatedModelTests camera nudged for deterministic SwiftShader rendering
  • Gold images for WARP, SwiftShader, and NVIDIA

Related Issue

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My change requires a change to the documentation.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have built and run the editor to try this change out.

xen2 added 5 commits April 6, 2026 11:29
Add D3D12 to the GPU test matrix in test-windows.yml.
Skip tests with known D3D12 rendering issues:
- Staging texture creation (D3D12 limitation)
- NextGenTest1 (NullReferenceException)
- TestGeometricPrimitives, TestCustomEffect, TestRenderToTexture,
  TesselationTest, SpriteRenderer3DTests (rendering differences)
@xen2 xen2 force-pushed the feature/ci-gpu-vulkan branch from 1db5b85 to b3bb8a2 Compare April 6, 2026 03:23
@Eideren
Collaborator

Eideren commented Apr 6, 2026

Can't comment on Graphics API stuff, but looks good otherwise.

AnimatedModelTests camera nudged for deterministic SwiftShader rendering

I've seen the change related to that; shouldn't we document this? Do you know where it comes from, and what users should watch out for if they run into the same issue?

You've also introduced a submodule to the repo; shouldn't we mention this in the readme's instructions?

@xen2 xen2 force-pushed the feature/ci-gpu-vulkan branch 2 times, most recently from ece1b7c to a99a323 Compare April 6, 2026 11:06
@xen2
Member Author

xen2 commented Apr 6, 2026

AnimatedModelTests camera nudged for deterministic SwiftShader rendering

I've seen the change related to that; shouldn't we document this? Do you know where it comes from, and what users should watch out for if they run into the same issue?

It's a minor difference when rendering with SwiftShader:

  • it's always the same single pixel, sometimes background and sometimes mesh, probably with something very close to 50% coverage
  • it happens on the same computer (I just ran it 10 times and it failed 3 times)

I am not sure whether SwiftShader is supposed to be deterministic, but I had no other issue except on this very specific test, on that very specific pixel.
I tried adjusting a few settings (forcing the thread count, etc.) but it didn't fix it.
It was a bit disappointing, but it's still much better than anything we had before (GPU/driver-specific gold images, no proper gold image unless the test was run once by a developer on a random PC on a 100% working commit, etc.)

Anyway, you're right, I should probably write it down somewhere for people writing tests.

You've also introduced a submodule to the repo; shouldn't we mention this in the readme's instructions?

Good point, will do!
Especially since it deserves some explanation: the idea is to build the dependencies in a GitHub workflow so that the build environment is easy to reproduce and control (a bit like using Docker).
Do you think it's a good idea?
(note: it's not set in stone, we could always change our approach later -- it's still a much better starting point than me building it on an undocumented computer with specific version of compilers)

As usual, thanks for the feedback!

@xen2
Member Author

xen2 commented Apr 6, 2026

Quick addition on why I used a submodule rather than the main repo:

With GitHub workflows, if you want a workflow_dispatch trigger so that you can run the workflow by clicking a button (which I think is the only option, since we can't rebuild on every commit), the workflow only appears in the Actions list once it has been merged into the main branch (!)

Let's say you are working on a feature branch and need a new DLL: you can't do it by adding a workflow to the Stride repo in that branch, because you can't run that workflow (unless it is already in master).

It's a bit unfortunate, but I couldn't figure out a workaround. That's why I came up with a separate deps repository, where we can more easily push the workflow to master and build the deps (the build can be done in a branch once the workflow is in master), then merge the branch with the DLL.

We can still give it a try on Stride directly to avoid submodules.
Or use NuGet packages to distribute/consume.

However, we will likely need a submodule for the SPIR-V branch anyway (unless I also copy/subtree it into our repo), as it needs CppNet8, SpirvHeaders and SpirvRegistry.

What do you think, should we avoid them? or embrace them?
I had horror stories with them when we were using them very actively for some sub-part of our engine (i.e. we used them for all large binaries in the past before git LFS). But in this case, those submodules won't change much and/or are external, so I thought it would be manageable.

@xen2
Member Author

xen2 commented Apr 6, 2026

Actually, let me try to put everything back in the main repo.
I will use a workflow that will generate a nuget package
I suppose it's not that big of a deal if we need to just commit that workflow (or at least a skeleton of it) on master before we can use it in branches.

xen2 added 18 commits April 6, 2026 22:08
…ixes Vulkan drawing order 1st frame for BillboardModeTests)
…promote D3D12 SpriteRenderer3DTests gold image
@Kryptos-FR
Member

Kryptos-FR commented Apr 6, 2026

I had horror stories with them when we were using them very actively for some sub-part of our engine

I remember that. I still have PTSD 😅.

I will use a workflow that will generate a nuget package

That seems more manageable if you can make it work.

Side note: after merging #3115, I think we still have some workflows that trigger on push (while filtering some paths). Now that the main CI is triggered all the time, we should remove that trigger from the sub-workflows. Except maybe build-android until it is fixed.

@xen2 xen2 force-pushed the feature/ci-gpu-vulkan branch 3 times, most recently from ace59ec to 8b4a5d4 Compare April 7, 2026 04:07
xen2 added 4 commits April 7, 2026 16:17
Track which command list last recorded a barrier for each resource
(LastBarrierCommandListId). When a resource is first used on a different
command list, re-issue the barrier even if the layout matches, so the
new command buffer has the transition recorded for Vulkan validation.

Fixes shadow map atlas validation errors in multi-threaded rendering
where worker command buffers sample textures transitioned by the main
command buffer within the same vkQueueSubmit batch.
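The rule described in this commit message can be sketched as follows. This is an illustration, not the code from the commit: `Resource`, `CommandList` and `transition` are hypothetical names, and layouts are plain strings. The only idea carried over from the message is that a barrier is skipped only when both the layout and the last recording command list match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    layout: str = "undefined"
    # Mirrors the LastBarrierCommandListId idea from the commit message.
    last_barrier_command_list_id: Optional[int] = None


class CommandList:
    def __init__(self, command_list_id: int):
        self.id = command_list_id
        self.recorded = []  # barriers recorded as (old_layout, new_layout)

    def transition(self, resource: Resource, new_layout: str) -> bool:
        same_layout = resource.layout == new_layout
        same_list = resource.last_barrier_command_list_id == self.id
        if same_layout and same_list:
            return False  # this command list already recorded the transition
        # Either the layout changes, or this is the first use on this command
        # list: record the barrier so this command buffer contains it.
        self.recorded.append((resource.layout, new_layout))
        resource.layout = new_layout
        resource.last_barrier_command_list_id = self.id
        return True
```

With a main command list that transitions a shadow atlas to a shader-read layout, a worker command list sampling the same atlas records its own (same-layout) barrier on first use and skips it afterwards. Because this sketch tracks a single ID, a resource that alternates between command lists re-records a barrier on each switch.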

The checked-in build/sign/sign.exe is a Git LFS file that wasn't
fetched by the sparse checkout, causing "not a valid application"
errors. Replace with dotnet tool install at CI time.
@xen2 xen2 force-pushed the feature/ci-gpu-vulkan branch from de3bbbf to e10ca0d Compare April 7, 2026 08:19
xen2 added 4 commits April 7, 2026 17:39
…o have it during test build (which needs it for CompilerApp)
Tests now use WARP/SwiftShader by default. Set STRIDE_TESTS_GPU=1 to
run on real GPU hardware. This makes Test Explorer and dotnet test
match gold images out of the box without needing a runsettings file.

- Module.cs sets STRIDE_GRAPHICS_SOFTWARE_RENDERING=1 unless
  STRIDE_TESTS_GPU=1 is set
- Flip launch profiles: Software is default, GPU sets STRIDE_TESTS_GPU
- Rename runsettings to GameTests-GPU.runsettings (opt-in to GPU)
- Add tests/GPU-TESTING.md with testing guidelines
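The default described above (software rendering unless STRIDE_TESTS_GPU=1) amounts to a small piece of environment logic. The sketch below shows that logic only; the actual change lives in Module.cs, and the function name `resolve_test_environment` is made up for this example.

```python
def resolve_test_environment(env: dict) -> dict:
    """Return a copy of env with the software-rendering default applied."""
    result = dict(env)
    # Opt in to real GPU hardware only when STRIDE_TESTS_GPU is exactly "1".
    if result.get("STRIDE_TESTS_GPU") != "1":
        result["STRIDE_GRAPHICS_SOFTWARE_RENDERING"] = "1"
    return result
```

With an empty environment the result enables software rendering, while an environment containing `STRIDE_TESTS_GPU=1` is returned unchanged, so the tests run on the real GPU.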
@xen2 xen2 force-pushed the feature/ci-gpu-vulkan branch from 6204fd2 to 80cc36b Compare April 8, 2026 00:51
@xen2 xen2 force-pushed the feature/ci-gpu-vulkan branch from f881d2c to 80cc36b Compare April 8, 2026 01:19
@xen2 xen2 merged commit d46db43 into master Apr 8, 2026
23 checks passed
@Eideren Eideren changed the title D3D12/Vulkan software rendering CI (incl. barrier rewrite and validation fixes) fix: D3D12/Vulkan software rendering CI (incl. barrier rewrite and validation fixes) Apr 8, 2026

3 participants